Channel and spatial attention mechanism has proven to provide an evident performance boost of deep convolution neural networks (CNNs). Most existing methods focus on one or run them parallel (series), neglecting the collaboration between the two attentions. In order to better establish the feature interaction between the two types of attention, we propose a plug-and-play attention module, which we term "CAT"-activating the Collaboration between spatial and channel Attentions based on learned Traits. Specifically, we represent traits as trainable coefficients (i.e., colla-factors) to adaptively combine contributions of different attention modules to fit different image hierarchies and tasks better. Moreover, we propose the global entropy pooling (GEP) apart from global average pooling (GAP) and global maximum pooling (GMP) operators, an effective component in suppressing noise signals by measuring the information disorder of feature maps. We introduce a three-way pooling operation into attention modules and apply the adaptive mechanism to fuse their outcomes. Extensive experiments on MS COCO, Pascal-VOC, Cifar-100, and ImageNet show that our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification. The model and code will be released soon.
translated by 谷歌翻译
Data uncertainty is commonly observed in the images for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. Taking a probabilistic view of the current classification model, the temperature scalar is exactly the scale of uncertainty noise implicitly added in the softmax function. Meanwhile, the uncertainty of images in a dataset should follow a prior distribution. Based on the observation, a unified framework for uncertainty modeling and FR, Random Temperature Scaling (RTS), is proposed to learn a reliable FR algorithm. The benefits of RTS are two-fold. (1) In the training phase, it can adjust the learning strength of clean and noisy samples for stability and accuracy. (2) In the test phase, it can provide a score of confidence to detect uncertain, low-quality and even OOD samples, without training on extra labels. Extensive experiments on FR benchmarks demonstrate that the magnitude of variance in RTS, which serves as an OOD detection metric, is closely related to the uncertainty of the input image. RTS can achieve top performance on both the FR and OOD detection tasks. Moreover, the model trained with RTS can perform robustly on datasets with noise. The proposed module is light-weight and only adds negligible computation cost to the model.
translated by 谷歌翻译
本文介绍了我们对第四次情感行为分析(ABAW)竞争的多任务学习(MTL)挑战的提交。基于视觉功能表示,我们利用三种类型的时间编码器来捕获视频中的时间上下文信息,包括基于变压器的编码器,基于LSTM的编码器和基于GRU的编码器。使用时间上下文感知表示,我们采用多任务框架来预测图像的价,唤醒,表达和AU值。此外,将平滑处理用于完善初始价和唤醒预测,并使用模型集成策略来结合不同模型设置的多个结果。我们的系统在MTL挑战验证数据集上实现了$ 1.742 $的性能。
translated by 谷歌翻译
(源)代码摘要旨在以自然语言的形式自动为给定代码段生成摘要/注释。此类摘要在帮助开发人员理解和维护源代码方面起着关键作用。现有的代码摘要技术可以分类为提取方法和抽象方法。提取方法使用检索技术从代码段中提取重要语句和关键字的子集,并生成一个摘要,该摘要保留了重要语句和关键字中的事实详细信息。但是,这样的子集可能会错过标识符或实体命名,因此,产生的摘要的自然性通常很差。抽象方法可以生成类似人写的摘要,从而利用神经机器翻译域的编码器模型。然而,生成的摘要通常会错过重要的事实细节。为了通过保留的事实细节生成类似人写的摘要,我们提出了一个新颖的提取和吸收框架。框架中的提取模块执行了提取代码摘要的任务,该任务列入了代码段,并预测包含关键事实细节的重要陈述。框架中的抽象模块执行了抽象代码摘要的任务,该任务是在整个代码段和并行的重要陈述中进行的,并生成了简洁而人工写的类似的自然语言摘要。我们通过在涉及六种编程语言的三个数据集上进行广泛的实验来评估称为EACS的有效性。实验结果表明,在所有三种广泛使用的指标(包括BLEU,流星和Rough-l)方面,EACS明显优于最先进的技术。
translated by 谷歌翻译
如今,在人员重新识别(Reid)任务的真实数据面临隐私问题,例如,禁止DataSet Dukemtmc-Reid。因此,收集Reid任务的真实数据变得更难。同时,标签的劳动力成本仍然很高,进一步阻碍了Reid研究的发展。因此,许多方法转向为REID算法生成合成图像作为替代方而不是真实图像。然而,合成和真实图像之间存在不可避免的领域差距。在以前的方法中,生成过程基于虚拟场景,并且无法根据不同的目标实际场景自动更改其合成训练数据。为了处理这个问题,我们提出了一种新颖的目标感知一代管道,以产生称为Tagerson的合成人物图像。具体地,它涉及参数化渲染方法,其中参数是可控的,并且可以根据目标场景调整。在Tagperson中,我们从目标场景中提取信息,并使用它们来控制我们的参数化渲染过程以生成目标感知的合成图像,这将使目标域中的实图像保持较小的间隙。在我们的实验中,我们的目标感知的合成图像可以实现比MSMT17上的广义合成图像更高的性能,即秩1精度的47.5%与40.9%。我们将发布此工具包\脚注{\ noindent代码可用于\ href {https://github.com/tagperson/tagperson-blender} {https://github.com/tagperson/tagperson -brender}}为Reid社区以任何所需味道产生合成图像。
translated by 谷歌翻译
随着深度学习技术的快速发展,各种最近的工作试图应用图形神经网络(GNN)来解决诸如布尔满足(SAT)之类的NP硬问题,这表明了桥接机器学习与象征性差距的潜力。然而,GNN预测的解决方案的质量并未在文献中进行很好地研究。在本文中,我们研究了GNNS在学习中解决最大可满足性(MaxSAT)问题的能力,从理论和实践角度来看。我们构建了两种GNN模型来学习来自基准的MaxSAT实例的解决方案,并显示GNN通过实验评估解决MaxSAT问题的有吸引力。我们还基于算法对准理论,我们还提出了GNNS可以在一定程度上学会解决MaxSAT问题的影响的理论解释。
translated by 谷歌翻译
Future work sentences (FWS) are the particular sentences in academic papers that contain the author's description of their proposed follow-up research direction. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to locate future work sentences more accurately and quickly and reduce the time and cost of acquiring the corpus. The current work on automatic identification of future work sentences is relatively small, and the existing research cannot accurately identify FWS from academic papers, and thus cannot conduct data mining on a large scale. Furthermore, there are many aspects to the content of future work, and the subdivision of the content is conducive to the analysis of specific development directions. In this paper, Nature Language Processing (NLP) is used as a case study, and FWS are extracted from academic papers and classified into different types. We manually build an annotated corpus with six different types of FWS. Then, automatic recognition and classification of FWS are implemented using machine learning models, and the performance of these models is compared based on the evaluation metrics. The results show that the Bernoulli Bayesian model has the best performance in the automatic recognition task, with the Macro F1 reaching 90.73%, and the SCIBERT model has the best performance in the automatic classification task, with the weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS and gain a deep understanding of the key content described in FWS, and we also demonstrate that content determination in FWS will be reflected in the subsequent research work by measuring the similarity between future work sentences and the abstracts.
translated by 谷歌翻译
Users' involvement in creating and propagating news is a vital aspect of fake news detection in online social networks. Intuitively, credible users are more likely to share trustworthy news, while untrusted users have a higher probability of spreading untrustworthy news. In this paper, we construct a dual-layer graph (i.e., the news layer and the user layer) to extract multiple relations of news and users in social networks to derive rich information for detecting fake news. Based on the dual-layer graph, we propose a fake news detection model named Us-DeFake. It learns the propagation features of news in the news layer and the interaction features of users in the user layer. Through the inter-layer in the graph, Us-DeFake fuses the user signals that contain credibility information into the news features, to provide distinctive user-aware embeddings of news for fake news detection. The training process conducts on multiple dual-layer subgraphs obtained by a graph sampler to scale Us-DeFake in large scale social networks. Extensive experiments on real-world datasets illustrate the superiority of Us-DeFake which outperforms all baselines, and the users' credibility signals learned by interaction relation can notably improve the performance of our model.
translated by 谷歌翻译
Pre-trained language models have been successful in natural language generation (NLG) tasks. While various decoding methods have been employed, they often produce suboptimal results. We first present an empirical analysis of three NLG tasks: summarization, machine translation, and constrained text generation. We found that selecting the best output from the results of multiple decoding methods can significantly improve performance. To further improve reranking for NLG tasks, we proposed a novel method, \textsc{PairReranker}, which uses a single encoder and a pairwise loss function to jointly encode a source input and a pair of candidates and compare them. Experiments on three NLG tasks demonstrated the effectiveness and flexibility of \textsc{PairReranker}, showing strong results, compared with previous baselines. In addition, our \textsc{PairReranker} can generalize to significantly improve GPT-3 (text-davinci-003) results (e.g., 24.55\% on CommonGen and 11.35\% on WMT18 zh-en), even though our rerankers are not trained with any GPT-3 candidates.
translated by 谷歌翻译
The application of Natural Language Processing (NLP) to specialized domains, such as the law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools emerges as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models, which have limitations on processing specialized legal terminology and syntax. In addition, legal documents may contain specialized vocabulary from other domains, such as medical terminology in personal injury text. Here, we propose LegalRelectra, a legal-domain language model that is trained on mixed-domain legal and medical corpora. We show that our model improves over general-domain and single-domain medical and legal language models when processing mixed-domain (personal injury) text. Our training architecture implements the Electra framework, but utilizes Reformer instead of BERT for its generator and discriminator. We show that this improves the model's performance on processing long passages and results in better long-range text comprehension.
translated by 谷歌翻译